<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta http-equiv="content-type"
 content="text/html; charset=ISO-8859-1">
<!--This file created 26.03.2005 14:29 Uhr by Claris Home Page version 3.0-->
  <title>Loose and tight LAM/MPI Integration in SGE</title>
  <meta name="GENERATOR" content="Claris Home Page 3.0">
</head>
<body style="background-color: rgb(255, 255, 255);">
<p><font size="+1"><b>Topic:</b></font></p>
<p>Loose and tight integration of the LAM/MPI library into SGE.</p>
<p><font size="+1"><b>Author:</b></font></p>
<p>Reuti, <a href="mailto:reuti__at__staff.uni-marburg.de">reuti__at__staff.uni-marburg.de;</a>
Philipps-University of Marburg, Germany</p>
<p><font size="+1"><b>Version:</b></font></p>
<p>1.0a -- 2005-03-23 Initial release, comments and corrections are
welcome<br>
1.1 -- 2010-11-22 Final version<br>
</p>
<p><font size="+1"><b>Contents:</b></font></p>
<ul>
  <li>Prerequisites</li>
  <li>Loose integration using rsh</li>
  <li>Modification of the LAM-MPI source for qrsh startup</li>
  <li>Loose integration using qrsh</li>
  <li>Tight integration using qrsh</li>
  <li>Modification of the Startup for other Platforms</li>
  <li>References and Documents</li>
</ul>
<p><font size="+1"><b>Note:</b></font></p>
<p>This HOWTO complements the information contained in the man pages
of SGE and the Administration Guide.</p>
<p><font size="+1"><b>Acknowledgement:</b></font></p>
<p>Many thanks to Sean Dilda for pointing out a bug in the
lamd_wrapper and eliminating an ambiguity.</p>
<p>
</p>
<hr><br>
<div style="text-align: center;"><big><big
 style="color: rgb(255, 0, 0);"><big><big>!!!
Warning !!!</big></big></big></big><br>
<br>
</div>
LAM/MPI was superseded by the Open MPI project. Open MPI has had a Tight
Integration into SGE built in since version 1.2: version 1.2 compiled it
in unconditionally, while for versions 1.3 and newer it is enabled when
you <span style="font-style: italic;">./configure</span> Open MPI with
"--with-sge". Please refer to the <a
 href="http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge">Open
MPI FAQ</a> for details.<br>
<br>
The Tight Integration works out-of-the-box with a parallel environment
whose start_proc_args and stop_proc_args are both set to NONE (in
essence, the same PE can now be used for Open MPI and MPICH2), and in
the job script a plain <span style="font-style: italic;">mpiexec</span>
will automatically discover the granted slots and nodes without any
further options. Nevertheless, in case more than one MPI installation is
available in a cluster, the <span style="font-style: italic;">mpiexec</span>
corresponding to the compiled application must be used as usual.<br>
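<br>
For illustration, a PE suitable for such an Open MPI setup might look
like the following sketch (the PE name "openmpi", the slot count and the
allocation rule are placeholders; adapt them to your cluster):<br>
<blockquote>
  <pre>$ qconf -sp openmpi
pe_name           openmpi
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   NONE
stop_proc_args    NONE
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task FALSE</pre>
</blockquote>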
<br>
This document stays here in case you come across a legacy installation
and have to fix or set up such an older version.<br>
<p>
</p>
<hr><br>
<font size="+1"><b>Prerequisites</b></font>
<blockquote><b>Configuration of SGE with qconf or the GUI &#8211;
Compilation of LAM/MPI &#8211; Cluster setup</b>
  <p>You should be familiar with modifying the SGE environment such as
the queue definitions and parallel environment (PE) configuration.
Additional information on the SGE parallel environment is available
from the "sge_pe" man pages (make sure the SGE man pages are defined in
your $MANPATH). You should also have access to the LAM/MPI source code,
and know how to build and install it. More information can be found in
the LAM/MPI Documentation (see <a href="#References%20and%20Documents">References

and
Documents</a>). For the "Loose integration using rsh" startup
method, a working (passwordless) login between the nodes in the cluster
is also required.<br>
  </p>
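  <p>A quick test that the passwordless login works as required (the
node name is a placeholder); the command should print the remote date
without any password prompt:</p>
  <blockquote>
    <pre>$ rsh node22 date</pre>
  </blockquote>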
  <p><b>Target platform</b></p>
  <p>This Howto targets LAM/MPI 7.1.1 on the Linux platform. Due to
the modifications made to the source code and the introduced startup
sequence, the integrations using qrsh will not work on platforms where
SGE by default uses the additional group feature for a proper shutdown.<br>
  </p>
  <p><b>LAM/MPI</b></p>
  <p>The LAM/MPI library from Indiana University (<a
 href="http://www.lam-mpi.org">http://www.lam-mpi.org</a>) is an MPI
implementation. In contrast to other implementations like MPICH (<a
 href="http://www-unix.mcs.anl.gov/mpi/mpich/">http://www-unix.mcs.anl.gov/mpi/mpich</a>),
it uses daemons on all the nodes included in a parallel job. One
advantage of such an approach is that a program issuing many mpirun
commands during its execution needs a "conventional" communication to
the nodes only once, to start the daemons. All further communication is
handled by the already running daemons, so additional mpiruns will
start up faster.<br>
  </p>
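  <p>For illustration, this is the daemon-based life cycle in a plain
interactive session (the hostnames in the boot schema file are
placeholders):</p>
  <blockquote>
    <pre>$ lamboot hostfile       # one-time start of a lamd on every listed node
$ mpirun C ./mpihello    # uses the already running daemons
$ mpirun C ./mpihello    # further runs start faster, no new remote logins
$ lamhalt                # shut all daemons down again</pre>
  </blockquote>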
  <p><b>Available startup methods in SGE</b></p>
  <p>For now there is no startup method in SGE which allows a program
started on a node to vanish into daemon land and still stay under SGE
control for correct accounting and proper shutdown of all slave tasks.
Therefore the loose integrations rely on a shutdown of the daemons by a
conventional LAM/MPI command, while the tight integration uses a
2-stage startup, which was introduced by Christopher Duncan for former
LAM/MPI versions [<a href="#[1]%20Chris%20sge-lam">1</a>].<br>
  </p>
  <p><b>Included setups and scripts</b></p>
  <p>All three possibilities to start up the LAM/MPI daemons introduced
here can be installed in parallel, as each has a unique PE and script
directory. This way you can play around with them and decide which one
to install in your cluster. While walking through this Howto, you can
install all of them in a shared directory, e.g. your home directory.
For the final installation they could be put in the conventional place
like $SGE_ROOT. The scripts, which are only slight modifications of the
original scripts in the $SGE_ROOT/mpi directory, can be downloaded here
for your convenience (the inserted lines are commented) [<a
 href="#[2]%20Archive%20sge-lam-integration">2</a>].</p>
  <p><b>Reading of this document</b></p>
  <p>To avoid explaining the complete LAM/MPI behavior in each of the
possible startup methods, this document should be read from the
beginning onward, even if you are only interested in, e.g., the Tight
integration. Some of the special settings of the scripts and PEs will
explain themselves once the default behavior is known.</p>
</blockquote>
<p>
</p>
<hr><br>
<font size="+1"><b>Loose integration using rsh</b></font>
<blockquote><b>Characteristic of the Startup Method</b>
  <p>The daemons on the slave nodes are started using a simple
(passwordless) rsh between the nodes.</p>
  <p>Since LAM/MPI 7.1.1 is SGE-aware, it will create a job-specific
directory on the nodes of the job of the form
"lam-$USER@$HOSTNAME-sge-$JOB_ID-$SGE_TASK_ID". These directories are
therefore unique for each job and will not interfere if two (or more)
LAM/MPI jobs are running on the same node. They are located in 1.
$TMPDIR or 2. /tmp. As $TMPDIR is set by SGE in this case only on the
headnode of the job, the directory will go into the SGE-created (and
SGE-removed) $TMPDIR there, and into /tmp on all the slave nodes. The
removal of these directories on the slave nodes takes place in the
stop_proc_args script of the PE, when the <i>lamhalt</i> command is
executed. In most cases this will work as intended, whether the job
finishes by a normal end of the program or is stopped with qdel. Don't
change $TMPDIR to a different location in your jobscript, because
mpirun can't execute correctly then, as it won't find any information
about the current setup. If you must change it for any reason, it must
also be changed in the start_proc_args and stop_proc_args scripts of
the PE.</p>
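  <p>For illustration: for job 10389 of user reuti (a plain job, not an
array job, hence $SGE_TASK_ID is "undefined"), a listing on a slave
node might show:</p>
  <blockquote>
    <pre>$ rsh node22 ls -d /tmp/lam-*
/tmp/lam-reuti@node22-sge-10389-undefined</pre>
  </blockquote>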
  <p>As SGE sets up no environment on the slave nodes, this startup
behaves most like an interactive startup, and hence the path to the
LAM/MPI binaries must also be defined on the slave nodes in a file
which is sourced during a non-interactive login.</p>
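  <p>With bash as login shell this can e.g. be done in ~/.bashrc, which
is sourced for the non-interactive rsh logins (the path is the one used
throughout this Howto):</p>
  <blockquote>
    <pre># in ~/.bashrc on all nodes:
export PATH=/home/reuti/local/lam-7.1.1/bin:$PATH</pre>
  </blockquote>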
  <p><b>Hostfile a.k.a. Machinefile Format</b></p>
  <p>Although LAM/MPI supports a machinefile format where each node is
suffixed by the number of slots to be used on this machine (i.e.
"node00 cpu=2"), the default format of the generated machinefile is
sufficient, where each machine is repeated a number of times equal to
the slots to be used there.</p>
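  <p>E.g. for two slots on each of two nodes, the generated machinefile
simply reads:</p>
  <blockquote>
    <pre>node21
node21
node22
node22</pre>
  </blockquote>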
  <p><b>Setup of the PE in SGE</b></p>
  <p>We will name this PE "lam_loose_rsh" and set the following entries:</p>
  <blockquote>
    <pre>$ qconf -sp lam_loose_rsh<br>pe_name           lam_loose_rsh<br>queue_list        para21 para22<br>slots             4<br>user_lists        NONE<br>xuser_lists       NONE<br>start_proc_args   /home/reuti/lam_loose_rsh/startlam.sh $pe_hostfile<br>stop_proc_args    /home/reuti/lam_loose_rsh/stoplam.sh<br>allocation_rule   $round_robin<br>control_slaves    FALSE<br>job_is_first_task TRUE</pre>
  </blockquote>
  <p>This setup is for SGE 5.3, where the queues to be used are defined
in the PE. For SGE 6.x, the PE must instead be added to the "pe_list"
entry of each cluster queue in which it should be used.</p>
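  <p>Under SGE 6.x this can e.g. be done on the command line like this
(the queue name is a placeholder):</p>
  <blockquote>
    <pre>$ qconf -aattr queue pe_list lam_loose_rsh para21</pre>
  </blockquote>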
  <p><b>Sample job execution</b></p>
  <p>Most likely you already have an MPI program based on LAM/MPI which
runs successfully in interactive mode. Otherwise you will find a sample
program in the archive supplied with this Howto, which will run until a
qdel of the job, so that the process distribution to the nodes can
easily be checked. Let's take this job script:</p>
  <blockquote>
    <pre>$ cat tester_mpi.sh<br>#!/bin/sh<br>export PATH=/home/reuti/local/lam-7.1.1/bin:$PATH<br>      <br>mpirun C /home/reuti/mpihello</pre>
  </blockquote>
  <p>Specifying the option "C" to mpirun will use all of the slots
granted to this job, which is most likely what you want all of the
time. If all went right, we can see the job running, and also check the
involved nodes for the running processes:</p>
  <blockquote>
    <pre>$ qsub -pe lam_loose_rsh 2 tester_mpi.sh<br>your job 10389 ("tester_mpi.sh") has been submitted<br>$ qstat<br>job-ID  prior name       user         state submit/start at     queue      master  ja-task-ID <br>---------------------------------------------------------------------------------------------<br>  10389     0 tester_mpi reuti        r     02/20/2005 15:25:31 para21     SLAVE          <br>  10389     0 tester_mpi reuti        r     02/20/2005 15:25:31 para22     MASTER<br>$ rsh node21 ps -e f -o pid,ppid,pgrp,command --cols=80<br>  PID  PPID  PGRP COMMAND<br>...<br> 8386     1  8386 /home/reuti/local/lam-7.1.1/bin/lamd -H 192.168.151.23 -P 4288<br> 8387  8386  8386  \_ /home/reuti/mpihello<br>$ rsh node22 ps -e f -o pid,ppid,pgrp,command --cols=80<br>  PID  PPID  PGRP COMMAND<br>...<br>24576     1 24576 /home/reuti/local/lam-7.1.1/bin/lamd -H 192.168.151.23 -P 4288<br>24581 24576 24576  \_ /home/reuti/mpihello</pre>
  </blockquote>
  <p>Using this setup, an issued qdel, which executes the <i>lamhalt</i>
in the stop_proc_args script of the PE, is the only way to remove the
running processes and created directories on the nodes (except the
directory on the headnode of the job, which is removed by SGE as
usual).</p>
  <p>A more sophisticated solution, in "pseudo code", would be to run
"rsh &lt;node&gt; 'ps some_jobid | grep process_leader | (cut
process_group) * -1 | xargs kill -9 --'" in a loop over all involved
machines, as sketched below. As we are interested in a Tight
integration, we will not dig further into this, and it's up to the
reader to implement it.</p>
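  <p>A rough sketch of such a loop (untested; it assumes the Linux
procps version of ps, and a real script would read the node names from
$PE_HOSTFILE instead of hard-coding them):</p>
  <blockquote>
    <pre>#!/bin/sh
# kill the process group of the lamd daemon on each slave node
for NODE in node21 node22 ; do
    rsh $NODE 'PGRP=`ps -C lamd -o pgrp= | head -n 1 | tr -d " "`
               if test -n "$PGRP" ; then kill -9 -- "-$PGRP" ; fi'
done</pre>
  </blockquote>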
</blockquote>
<p>
</p>
<hr><br>
<font size="+1"><b>Modification of the LAM/MPI source for qrsh
startup</b></font>
<blockquote><b>Pitfall during startup of the Daemons</b>
  <p>The daemon on the nodes is started by the program <i>hboot</i> in
LAM/MPI, which forks a child process that in turn is replaced by the <i>lamd</i>.
While this works without any modification on the headnode of the job,
where <i>hboot</i> is started locally without rsh, we face a problem on
the slave nodes: when the parent task ends, SGE will discover this and
kill the whole process group of the job. This would also kill the just
started <i>lamd</i>, as it is still in the same process group that <i>hboot</i>
was in before.</p>
  <p><b>Modification</b></p>
  <p>The solution to this problem is the system call <i>setsid</i>,
which enforces a new session and process group for the child task. It
has to be added in line 317 of <i>hboot.c</i> (which can be found in
your lam-7.1.1 source code installation in the directory <i>tools/hboot</i>),
after the child forks. The program section:</p>
  <blockquote>
    <pre>                else if (pid == 0) {            /* child */<br>      <br>                        if (ao_taken(ad, "s")) {</pre>
  </blockquote>
  <p>should become:</p>
  <blockquote>
    <pre>                else if (pid == 0) {            /* child */<br>                        setsid();<br>                        if (ao_taken(ad, "s")) {</pre>
  </blockquote>
  <p>Then the executable <i>hboot</i> must be built again with a
conventional <i>make</i> inside the lam-7.1.1 source code installation,
and the resulting binary <i>hboot</i> copied by hand to the place where
the binaries were installed, e.g. <i>~/local/lam-7.1.1/bin</i>.</p>
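  <p>E.g. (assuming the source tree was already configured during the
original installation):</p>
  <blockquote>
    <pre>$ cd lam-7.1.1/tools/hboot
$ make
$ cp hboot ~/local/lam-7.1.1/bin/hboot</pre>
  </blockquote>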
  <p><b>No warranty</b></p>
  <p>As usual, there is no guarantee that this will work successfully
on all Linux distributions and versions. Use it at your own risk.</p>
</blockquote>
<p>
</p>
<hr><br>
<font size="+1"><b>Loose integration using qrsh</b></font>
<blockquote><b>Different behavior on the slave nodes compared to "Loose
integration using rsh"</b>
  <p>Because we are now using qrsh to start the daemons, no rsh would
be necessary between the nodes; it is just still set up here to have
easy access to the nodes. The used qrsh will now also create a
temporary directory on the slave nodes and set $TMPDIR accordingly.
Unfortunately, this directory is deleted when the qrsh command (i.e.
the executed <i>hboot</i>) ends, so it can't be used to store the
information about the daemon setup which LAM/MPI creates. As $TMPDIR is
the only environment variable which can't be exported to the nodes by
using the "-v" or "-V" option to qrsh, we have to use another way.
Otherwise you will see the job starting as intended, and the <i>lamhalt</i>
will also look successful, but the processes will still be running on
the slave nodes in case you issued a qdel, as LAM/MPI stores the PIDs
of the processes which need to be killed in its special directory.</p>
  <p>The rsh-wrapper supplied in the mpi subdirectory of $SGE_ROOT
already has the option to define a prefix command, which is executed
before the command on the slave node. This can be set in the
start_proc_args script of the PE:</p>
  <blockquote>
    <pre>export RCMD_PREFIX="export TMPDIR=/tmp;"</pre>
  </blockquote>
  <p>This way, the LAM/MPI-created directories are in the same place as
when using just the plain rsh. Be sure that you also change the path to
the actual location of your rsh-wrapper in the supplied script
startlam.sh of the archive.</p>
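  <p>In the stock $SGE_ROOT/mpi/startmpi.sh, on which startlam.sh is
based, the wrapper location is defined by a line like the following,
which has to point to your copy (the path shown is the one used in this
Howto):</p>
  <blockquote>
    <pre>rsh_wrapper=/home/reuti/lam_loose_qrsh/rsh</pre>
  </blockquote>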
  <p><b>Setup of the PE in SGE</b></p>
  <p>We will name this PE "lam_loose_qrsh" and set the following
entries:</p>
  <blockquote>
    <pre>$ qconf -sp lam_loose_qrsh<br>pe_name           lam_loose_qrsh<br>queue_list        para21 para22<br>slots             4<br>user_lists        NONE<br>xuser_lists       NONE<br>start_proc_args   /home/reuti/lam_loose_qrsh/startlam.sh -catch_rsh $pe_hostfile<br>stop_proc_args    /home/reuti/lam_loose_qrsh/stoplam.sh<br>allocation_rule   $round_robin<br>control_slaves    TRUE<br>job_is_first_task TRUE</pre>
  </blockquote>
  <p>To allow the execution of the qrsh commands, the control_slaves
entry must be set to TRUE in this case. On the headnode of the job no
qrsh is necessary, so we can safely set job_is_first_task to TRUE.</p>
  <p><b>Behavior on platforms with process control by the Additional
Group Feature</b></p>
  <p>Although the started <i>lamd</i> would be in a new process group,
the reliable shutdown of all the processes using the additional group
feature (see: man setgroups) by SGE would also kill the just started
daemon, as it can't escape from this additional group.</p>
  <p><b>Disadvantages and Advantages of the Loose Integration</b></p>
  <p>Whichever of the two loose integrations you choose, they share
most of their characteristics, as the only difference is the usage of
qrsh instead of the plain rsh:</p>
  <ul>
    <li>- wrong accounting</li>
    <li>- no processes/directories controlled by SGE, removal relies on
the <i>lamhalt</i> command</li>
  </ul>
  <p>If all went okay, the result in case of a normal program stop or
an abort with qdel is:</p>
  <ul>
    <li>+ no semaphores or shared memory segments left behind</li>
    <li>+ processes and directories removed on the slave nodes</li>
  </ul>
  <p>As the output is exactly the same as with the "Loose integration
using rsh", we skip it here, because it would not present any new
information. The scripts for this startup method are of course also
included in the script archive supplied with this Howto.</p>
</blockquote>
<p>
</p>
<hr><br>
<font size="+1"><b>Tight integration using qrsh</b></font>
<blockquote><b>2-stage Startup for Tight Integration</b>
  <p>For now, there is no direct possibility in SGE to start a program
in daemon mode on a node while keeping it under the control of SGE for
correct accounting and removal of the processes.</p>
  <p>The original idea by Christopher Duncan in [<a
 href="#[1]%20Chris%20sge-lam">1</a>] was to first make a conventional
qrsh to the slave node, and on this slave node a local qrsh to start
the final daemon. This is possible with LAM/MPI because the daemon does
not push itself into the background (as daemonizing programs mostly
do); instead the starting <i>hboot</i> does the forking. At this step,
it is possible to insert the local qrsh call and achieve the desired
effect.</p>
  <p><b>Additional changes to LAM/MPI</b></p>
  <p>For this to work, it is necessary to use the <i>hboot</i> already
patched as described in the chapter "Modification of the LAM/MPI source
for qrsh startup". Another change is now necessary to insert the local
qrsh. Therefore the following steps must be executed:</p>
  <ul>
    <li>move to your binary installation of LAM/MPI, e.g.
~/local/lam-7.1.1/bin</li>
    <li>rename <b>lamd</b> to <b>lamd_binary</b></li>
    <li>create a file <b>lamd_wrapper</b> with the following content:
      <blockquote>
        <pre>#!/bin/sh
#
# Modified startup of lamd to get it working in a controlled
# setup by SGE. To be used for interactive usage like usual.
# (Note: the (( )) arithmetic syntax requires that /bin/sh is
# actually bash, as it commonly is on Linux.)
#

LAMD_BINARY=lamd_binary

if [ "$ENVIRONMENT" = "BATCH" -a "$PE" = "lam_tight_qrsh" ] ; then
    COUNTER=5
    while (( COUNTER-- &gt; 0 )) ; do
        if (( `ps --no-headers -o ppid -p $$` != 1 )) ; then
            :   # hboot is still our parent: poll again
        else
            # hboot has gone and we were reparented to init:
            # restart the real lamd under SGE control by a local qrsh
            exec $SGE_ROOT/bin/$ARC/qrsh -V -inherit -nostdin $HOSTNAME $LAMD_BINARY "$@"
        fi
    done
else
    exec $LAMD_BINARY "$@"
fi

#
# If we are lucky, we never get here.
#

exit -1</pre>
      </blockquote>
    </li>
    <li>make the <b>lamd_wrapper</b> executable, i.e.: chmod +x
lamd_wrapper</li>
    <li>make a symbolic link from <b>lamd</b> to <b>lamd_wrapper</b>,
i.e.: ln -s lamd_wrapper lamd</li>
  </ul>
  <p>With this wrapper script, the installed LAM/MPI can still be used
interactively or with the loose integration scripts. So nothing must be
changed back to get the initial behavior.</p>
  <p><b>Setup of the PE in SGE</b></p>
  <p>A new PE "lam_tight_qrsh" will reflect the changes for this tight
integration:</p>
  <blockquote>
    <pre>$ qconf -sp lam_tight_qrsh<br>pe_name           lam_tight_qrsh<br>queue_list        para21 para22<br>slots             4<br>user_lists        NONE<br>xuser_lists       NONE<br>start_proc_args   /home/reuti/lam_tight_qrsh/startlam.sh -catch_rsh $pe_hostfile<br>stop_proc_args    /home/reuti/lam_tight_qrsh/stoplam.sh<br>allocation_rule   $round_robin<br>control_slaves    TRUE<br>job_is_first_task FALSE</pre>
  </blockquote>
  <p>As we now issue a qrsh also on the headnode of the parallel job,
job_is_first_task must be set to FALSE. It may happen that we get only
one slot on this machine.</p>
  <p><b>Program flow of the wrapper script</b></p>
  <p>SGE controls the number of qrsh calls made to each node, so the
idea is to wait a short time until the parent PID of this lamd_wrapper
becomes 1. This happens once the starting <i>hboot</i> has left the
machine.</p>
  <p>[Side note: in case there are race conditions and SGE still
prevents the second local qrsh from being made (although the parent PID
already became 1), the allocation_rule could be changed to just 2 (and
the PE so limited to dual-CPU machines); this would give the exact
behavior of Christopher Duncan's initial Perl script. In this case
there is no need to wait at all, and the entry job_is_first_task could
be changed to TRUE (at least one local qrsh is allowed then).]</p>
  <p>On some slow systems it may be worthwhile to increase the number
in the wait loop to a higher value, although it was never observed,
even on already loaded machines, that the waiting loop in the
lamd_wrapper was actually entered. It is only in the script to avoid an
endless loop in case anything goes wrong.</p>
  <p>Again, the startlam.sh needs to be adjusted, so that the actual
location of the rsh wrapper is used here.</p>
  <p><b>Sample execution</b></p>
  <p>Using the same commands as in the first example, the results show
the tight integration of LAM/MPI:</p>
  <blockquote>
    <pre>$ qsub -pe lam_tight_qrsh 2 tester_mpi.sh<br>your job 10395 ("tester_mpi.sh") has been submitted<br>$ qstat<br>job-ID  prior name       user         state submit/start at     queue      master  ja-task-ID <br>---------------------------------------------------------------------------------------------<br>  10395     0 tester_mpi reuti        r     02/20/2005 18:39:25 para21     MASTER         <br>            0 tester_mpi reuti        r     02/20/2005 18:39:25 para21     SLAVE          <br>  10395     0 tester_mpi reuti        r     02/20/2005 18:39:25 para22     SLAVE<br>$ rsh node21 ps -e f -o pid,ppid,pgrp,command --cols=80<br>  PID  PPID  PGRP COMMAND<br>...<br>  789     1   789 /usr/sge/bin/glinux/sge_execd<br> 8945   789  8945  \_ sge_shepherd-10395 -bg<br> 8993  8945  8993  |   \_ /bin/sh /var/spool/sge/node21/job_scripts/10395<br> 8994  8993  8993  |       \_ mpirun C /home/reuti/mpihello<br> 8972   789  8972  \_ sge_shepherd-10395 -bg<br> 8973  8972  8973      \_ /usr/sge/utilbin/glinux/rshd -l<br> 8975  8973  8975          \_ /usr/sge/utilbin/glinux/qrsh_starter /var/spool/sg<br> 8976  8975  8976              \_ lamd_binary -H 192.168.151.22 -P 55513 -n 0 -o<br> 8995  8976  8976                  \_ /home/reuti/mpihello<br> 8971     1  8971 /usr/sge/bin/glinux/qrsh -V -inherit -nostdin node21 lamd_bina<br> 8974  8971  8971  \_ /usr/sge/utilbin/glinux/rsh -n -p 55517 node21 exec '/usr/<br>$ rsh node22 ps -e f -o pid,ppid,pgrp,command --cols=80<br>  PID  PPID  PGRP COMMAND<br>...<br>  789     1   789 /usr/sge/bin/glinux/sge_execd<br>25180   789 25180  \_ sge_shepherd-10395 -bg<br>25181 25180 25181      \_ /usr/sge/utilbin/glinux/rshd -l<br>25183 25181 25183          \_ /usr/sge/utilbin/glinux/qrsh_starter /var/spool/sg<br>25184 25183 25184              \_ lamd_binary -H 192.168.151.22 -P 55513 -n 1 -o<br>25185 25184 25184                  \_ /home/reuti/mpihello<br>25179     1 25179 /usr/sge/bin/glinux/qrsh -V -inherit -nostdin node22 lamd_bina<br>25182 25179 25179  \_ /usr/sge/utilbin/glinux/rsh -n -p 49142 node22 exec '/usr/</pre>
  </blockquote>
  <p>The escaped processes don't execute any time- or resource-consuming
program, and can just stay there outside the control of SGE. The really
executing <i>lamd</i> and its child processes are under SGE control,
which was the goal to be achieved by this setup. These processes will
be killed when they finish on their own, or after issuing a qdel. In
the latter case, it is not possible to get a proper shutdown by using <i>lamhalt</i>,
since the <i>lamd</i> was already shut down and the directory
containing the LAM/MPI-specific information was removed by SGE.</p>
  <p><b>Advantages and Disadvantages of the Tight Integration</b></p>
  <p>With this setup we trade some advantages for the disadvantages:</p>
  <ul>
    <li>+ correct accounting</li>
    <li>+ processes/directories controlled by SGE, definitely removed
at the end of the job</li>
  </ul>
  <p>The drawback is in this case:</p>
  <ul>
    <li>- semaphores or shared memory segments may be left behind, if
the job was aborted with qdel</li>
  </ul>
  <p>Depending on the usage of the nodes, there may be a need to clean
up leftover semaphores or shared memory segments from time to time, or
in the stop_proc_args script of the PE, in case jobs are often killed
by qdel.</p>
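  <p>A sketch of such a cleanup for the current user (Linux util-linux
syntax for ipcs/ipcrm; note that this removes <i>all</i> IPC objects of
the user on the node, so it is only safe when no other job of this user
is running there):</p>
  <blockquote>
    <pre>#!/bin/sh
# remove leftover semaphores and shared memory segments of $USER
for ID in `ipcs -s | awk -v U=$USER '$3 == U { print $2 }'` ; do
    ipcrm -s $ID
done
for ID in `ipcs -m | awk -v U=$USER '$3 == U { print $2 }'` ; do
    ipcrm -m $ID
done</pre>
  </blockquote>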
</blockquote>
<p>
</p>
<hr><br>
<font size="+1"><b>Modification of the Startup for other
Platforms</b></font>
<blockquote>In case the processes are shut down by the additional group
feature, the other way round may be successful: use the "Tight
integration using qrsh" and, instead of the inserted <i>setsid</i>,
wait at the end of <i>hboot</i> until the <i>lamd</i> is up and
running. Although the starting local qrsh will be removed in this case
(and so not appear in a process listing like the one above), the
started <i>lamd</i> will continue to operate and service the program
execution.</blockquote>
<p>
</p>
<hr><br>
<a name="References and Documents"></a><font size="+1"><b>References
and Documents</b></font>
<blockquote><b>SGE-LAM Integration</b>
  <p><a name="[1] Chris sge-lam"></a>[1] Latest version of the script
sge-lam by Christopher Duncan. Because it can only be found in the
mailing list archive at SGE and LAM/MPI, it's attached here for
reference (<a href="sge-lam">sge-lam</a>). As there was a small bug
inside, this version was patched to remove any "-n" for the remote-qrsh
call; this was taken from a former version of this script, where it was
included.</p>
  <p><a name="[2] Archive sge-lam-integration"></a>[2] Archive with all
the scripts used in this Howto: <a
 href="sge-lam-integration-scripts.tgz">sge-lam-integration-scripts.tgz</a>.<br>
  </p>
  <p><b>LAM/MPI</b></p>
  <p>The implementation specific documentation can be found at the
LAM/MPI website on the page "Documentation" (<a
 href="http://www.lam-mpi.org/using/docs/">http://www.lam-mpi.org/using/docs/</a>).<br>
  </p>
  <p><b>MPI documentation in general and tutorials</b></p>
  <p>For a general introduction to MPI and MPI programming, you can
study the following documents:</p>
  <ul>
    <li><a href="http://www.mpi-forum.org/docs">http://www.mpi-forum.org/docs</a></li>
    <li><a
 href="http://www.netlib.org/utk/papers/mpi-book/mpi-book.html">http://www.netlib.org/utk/papers/mpi-book/mpi-book.html</a></li>
    <li><a href="http://www-unix.mcs.anl.gov/mpi/usingmpi/index.html">http://www-unix.mcs.anl.gov/mpi/usingmpi/index.html</a></li>
    <li><a href="http://www-unix.mcs.anl.gov/mpi/usingmpi2">http://www-unix.mcs.anl.gov/mpi/usingmpi2</a></li>
    <li><a href="ftp://math.usfca.edu/pub/MPI/mpi.guide.ps">ftp://math.usfca.edu/pub/MPI/mpi.guide.ps</a></li>
  </ul>
</blockquote>
</body>
</html>
